CS525 - Advanced Database Organization - 2020 Fall

Course webpage for CS525 - 2020 Fall taught by Boris Glavic

Assignment 2 - Buffer Manager

You should implement a buffer manager in this assignment. The buffer manager manages a fixed number of pages in memory that represent pages from a page file managed by the storage manager implemented in assignment 1. The memory pages managed by the buffer manager are called page frames or frames for short. We call the combination of a page file and the page frames storing pages from that file a Buffer Pool. The buffer manager should be able to handle more than one open buffer pool at the same time. However, there can only be one buffer pool for each page file. Each buffer pool uses one page replacement strategy that is determined when the buffer pool is initialized. You should implement at least two replacement strategies: FIFO and LRU. Your solution should implement all the methods defined in the buffer_mgr.h header explained below.

Make use of existing debugging and memory checking tools. At some point you will have to debug an error. See the main assignment page for information about debugging. Memory leaks are errors!

Buffer Pool Functionality and Concepts

A buffer pool consists of a fixed number of page frames (pages in memory) that are used to store disk pages from a page file in memory. Clients of the buffer manager can request pages identified by their position in the page file (page number) to be loaded into a page frame. This is called pinning a page. Internally, the buffer manager has to check whether the page requested by the client is already cached in a page frame. If this is the case, then the buffer manager simply returns a pointer to this page frame to the client. Otherwise, the buffer manager has to read this page from disk and decide in which page frame it should be stored (this is what the replacement strategy is for). Once an appropriate frame is found and the page has been loaded, the buffer manager returns a pointer to this frame to the client. The client can then start to read and/or modify that page.

Once the client is done with reading or writing a page, it needs to inform the buffer manager that it no longer needs that page. This is called unpinning. Furthermore, the buffer manager needs to know whether the page was modified by the client. This is realized by requiring the client to call a function to tell the buffer manager that the page is dirty. The buffer manager needs this information for replacing pages in the buffer pool. If a dirty page is evicted from the buffer pool, then the buffer manager needs to write the content of this page back to disk. Otherwise, the modifications done by the client would be lost.

Since buffer pools are used concurrently by several components of a DBMS, the same page can be pinned by more than one client. Making the functions of the buffer manager thread-safe is not part of the assignment. The number of clients having pinned a page is called the fix count of that page. The buffer manager can only evict pages with fix count 0 from the pool, because a non-zero fix count indicates that at least one client is still using the page. Pinning a page increases its fix count by 1; unpinning the page reduces its fix count by 1.
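The fix count rules can be illustrated with a minimal sketch. The Frame struct and the three helper functions below are hypothetical names chosen for illustration; they are not part of buffer_mgr.h, and your own bookkeeping may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-frame bookkeeping; not part of buffer_mgr.h. */
typedef struct Frame {
  int pageNum;   /* page stored in this frame, -1 if the frame is empty */
  int fixCount;  /* number of clients that currently pin the page */
  bool dirty;    /* true if the page was modified since it was loaded */
} Frame;

/* A client pins the page: one more user of this frame. */
static void framePin(Frame *f) {
  assert(f != NULL);
  f->fixCount++;
}

/* A client unpins the page. Returns false if the page was not pinned. */
static bool frameUnpin(Frame *f) {
  assert(f != NULL);
  if (f->fixCount == 0)
    return false;
  f->fixCount--;
  return true;
}

/* Only frames that no client is using may be chosen for replacement. */
static bool frameIsEvictable(const Frame *f) {
  assert(f != NULL);
  return f->fixCount == 0;
}
```

If two clients pin the same page, the frame only becomes evictable again after both of them have unpinned it.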

Some hints and reminders:

  • Independent of the page replacement strategy, the buffer manager is only allowed to evict pages with fix count zero. This has to be taken into account when implementing page replacement strategies.

  • Dirty pages can be evicted from the pool if they have fix count 0, but they have to be written back to disk before the eviction.

  • If a dirty page is written back to disk and has fix count 0, then it is no longer considered dirty.

  • Your buffer manager needs to maintain a mapping between page numbers and page frames to enable fast look-ups from page number to page frame and vice versa.
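As an illustration of the first hint, here is one possible way to select a FIFO victim while skipping pinned frames. The Frame struct, the loadedAt counter, and pickFifoVictim are assumptions made for this sketch and are not prescribed by the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame bookkeeping for this sketch. */
typedef struct Frame {
  int pageNum;    /* -1 marks an empty frame (mirrors NO_PAGE) */
  int fixCount;   /* number of clients pinning the page */
  long loadedAt;  /* logical time at which the page was loaded */
} Frame;

/* Returns the index of the frame to reuse, or -1 if every frame is pinned. */
static int pickFifoVictim(const Frame *frames, int numFrames) {
  assert(frames != NULL && numFrames > 0);
  int victim = -1;
  for (int i = 0; i < numFrames; i++) {
    if (frames[i].pageNum == -1)
      return i;   /* empty frame: nothing has to be evicted */
    if (frames[i].fixCount != 0)
      continue;   /* pinned pages must never be evicted */
    if (victim == -1 || frames[i].loadedAt < frames[victim].loadedAt)
      victim = i; /* oldest unpinned page seen so far */
  }
  return victim;
}
```

An LRU variant could follow the same pattern by comparing a time stamp that is updated on every pin instead of only when the page is loaded. If the chosen victim is dirty, it still has to be written back before the frame is reused.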

Optional Extensions

Realize these optional extensions for extra credit and extra fun. ;-)

  • Make the buffer pool functions thread safe. This extension would result in your buffer manager being closer to real life buffer manager implementations.

  • Implement additional page replacement strategies such as CLOCK or LRU-k.

Interface

The header for the buffer manager interface is shown below. Your solution should implement all functions defined in this header.

Data Structures

The header defines two important data structures: BM_BufferPool and BM_PageHandle.

The BM_BufferPool stores information about a buffer pool: the name of the page file associated with the buffer pool (pageFile), the size of the buffer pool, i.e., the number of page frames (numPages), the page replacement strategy (strategy), and a pointer to bookkeeping data (mgmtData). Similar to the first assignment, you can use the mgmtData to store any necessary information about a buffer pool that you need to implement the interface. For example, this could include a pointer to the area in memory that stores the page frames or data structures needed by the page replacement strategy to make replacement decisions.

typedef struct BM_BufferPool {
  char *pageFile;
  int numPages;
  ReplacementStrategy strategy;
  void *mgmtData;
} BM_BufferPool;
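The following sketch shows one possible layout for the data that mgmtData could point to. All names (PoolMgmt, FrameMeta, SKETCH_PAGE_SIZE, and the helper functions) are hypothetical, and the page size of 4096 bytes is an assumption; use the page size defined by your storage manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumption for this sketch; use the storage manager's real page size. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical per-frame metadata. */
typedef struct FrameMeta {
  int pageNum;   /* -1 for an empty frame */
  int fixCount;
  bool dirty;
} FrameMeta;

/* Hypothetical bookkeeping record that mgmtData could point to. */
typedef struct PoolMgmt {
  char *frameData;   /* numFrames * SKETCH_PAGE_SIZE bytes of page frames */
  FrameMeta *meta;   /* one metadata entry per frame */
  int numFrames;
  int numReadIO;
  int numWriteIO;
} PoolMgmt;

/* Allocates the bookkeeping record with all frames empty; NULL on failure. */
static PoolMgmt *poolMgmtCreate(int numFrames) {
  assert(numFrames > 0);
  PoolMgmt *m = malloc(sizeof *m);
  if (m == NULL)
    return NULL;
  m->frameData = calloc((size_t) numFrames, SKETCH_PAGE_SIZE);
  m->meta = malloc((size_t) numFrames * sizeof *m->meta);
  if (m->frameData == NULL || m->meta == NULL) {
    free(m->frameData);
    free(m->meta);
    free(m);
    return NULL;
  }
  for (int i = 0; i < numFrames; i++) {
    m->meta[i].pageNum = -1;
    m->meta[i].fixCount = 0;
    m->meta[i].dirty = false;
  }
  m->numFrames = numFrames;
  m->numReadIO = 0;
  m->numWriteIO = 0;
  return m;
}

/* Returns the start of the memory area of frame i. */
static char *poolMgmtFrame(PoolMgmt *m, int i) {
  assert(m != NULL && i >= 0 && i < m->numFrames);
  return m->frameData + (size_t) i * SKETCH_PAGE_SIZE;
}

/* Releases everything allocated by poolMgmtCreate. */
static void poolMgmtDestroy(PoolMgmt *m) {
  if (m == NULL)
    return;
  free(m->frameData);
  free(m->meta);
  free(m);
}
```

Keeping all frames in one contiguous allocation means there are only three allocations to release at shutdown, which makes memory leaks easier to avoid.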

The BM_PageHandle stores information about a page. The page number (position of the page in the page file) is stored in pageNum. The page number of the first data page in a page file is 0. The data field points to the area in memory storing the content of the page. This will usually be a page frame from your buffer pool.

typedef struct BM_PageHandle {
  PageNumber pageNum;
  char *data;
} BM_PageHandle;

Buffer Pool Functions

These functions are used to create a buffer pool for an existing page file (initBufferPool), shut down a buffer pool and free up all associated resources (shutdownBufferPool), and to force the buffer manager to write all dirty pages to disk (forceFlushPool).

initBufferPool creates a new buffer pool with numPages page frames using the page replacement strategy strategy. The pool is used to cache pages from the page file with name pageFileName. Initially, all page frames should be empty. The page file should already exist, i.e., this method should not generate a new page file. stratData can be used to pass parameters for the page replacement strategy. For example, for LRU-k this could be the parameter k.

shutdownBufferPool destroys a buffer pool. This method should free up all resources associated with the buffer pool. For example, it should free the memory allocated for page frames. If the buffer pool contains any dirty pages, then these pages should be written back to disk before destroying the pool. It is an error to shut down a buffer pool that has pinned pages.

forceFlushPool causes all dirty pages (with fix count 0) from the buffer pool to be written to disk.
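The flushing rule can be sketched as follows. Frame, WritePageFn, flushUnpinnedDirty, and countingWriter are hypothetical names used only for this illustration; in your solution the write callback would be replaced by the block-write function of your storage manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame bookkeeping for this sketch. */
typedef struct Frame {
  int pageNum;   /* -1 for an empty frame */
  int fixCount;
  bool dirty;
} Frame;

/* Stand-in for the storage manager's write routine; returns 0 on success. */
typedef int (*WritePageFn)(int pageNum, void *ctx);

/* Writes every dirty frame with fix count 0 and marks it clean.
 * Returns the number of pages written, or -1 on the first write error. */
static int flushUnpinnedDirty(Frame *frames, int numFrames,
                              WritePageFn writePage, void *ctx) {
  assert(frames != NULL && writePage != NULL);
  int written = 0;
  for (int i = 0; i < numFrames; i++) {
    Frame *f = &frames[i];
    if (f->pageNum == -1 || !f->dirty || f->fixCount != 0)
      continue;               /* empty, clean, or still pinned */
    if (writePage(f->pageNum, ctx) != 0)
      return -1;
    f->dirty = false;         /* written back, so no longer dirty */
    written++;
  }
  return written;
}

/* Example writer for demonstration: only counts its calls through ctx. */
static int countingWriter(int pageNum, void *ctx) {
  (void) pageNum;
  (*(int *) ctx)++;
  return 0;
}
```

The number of pages written is returned so that the caller can add it to its write I/O statistic.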

Page Management Functions

These functions are used to pin pages, unpin pages, mark pages as dirty, and force a page back to disk.

pinPage pins the page with page number pageNum. The buffer manager is responsible for setting the pageNum field of the page handle passed to the method. Similarly, the data field should point to the page frame the page is stored in (the area in memory storing the content of the page).

unpinPage unpins the page page. The pageNum field of page should be used to figure out which page to unpin.

markDirty marks a page as dirty.

forcePage should write the current content of the page back to the page file on disk.

Statistics Functions

These functions return statistics about a buffer pool and its contents. The print debug functions explained below internally use these functions to gather information about a pool.

The getFrameContents function returns an array of PageNumbers (of size numPages) where the ith element is the number of the page stored in the ith page frame. An empty page frame is represented using the constant NO_PAGE.

The getDirtyFlags function returns an array of bools (of size numPages) where the ith element is TRUE if the page stored in the ith page frame is dirty. Empty page frames are considered as clean.

The getFixCounts function returns an array of ints (of size numPages) where the ith element is the fix count of the page stored in the ith page frame. Return 0 for empty page frames.
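A possible shape for these three functions is sketched below on top of a hypothetical Frame array. The function names and the choice to return freshly allocated arrays that the caller frees are assumptions of this sketch; check how the provided test and print code treats the returned arrays before deciding on memory ownership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical frame bookkeeping for this sketch. */
typedef struct Frame {
  int pageNum;   /* -1 for an empty frame (mirrors NO_PAGE) */
  int fixCount;
  bool dirty;
} Frame;

/* Page number stored in each frame; -1 for empty frames. */
static int *frameContents(const Frame *frames, int numFrames) {
  assert(frames != NULL && numFrames > 0);
  int *out = malloc((size_t) numFrames * sizeof *out);
  if (out == NULL)
    return NULL;
  for (int i = 0; i < numFrames; i++)
    out[i] = frames[i].pageNum;
  return out;
}

/* Dirty flag of each frame; empty frames are reported as clean. */
static bool *dirtyFlags(const Frame *frames, int numFrames) {
  assert(frames != NULL && numFrames > 0);
  bool *out = malloc((size_t) numFrames * sizeof *out);
  if (out == NULL)
    return NULL;
  for (int i = 0; i < numFrames; i++)
    out[i] = frames[i].pageNum != -1 && frames[i].dirty;
  return out;
}

/* Fix count of each frame; empty frames are reported with 0. */
static int *fixCounts(const Frame *frames, int numFrames) {
  assert(frames != NULL && numFrames > 0);
  int *out = malloc((size_t) numFrames * sizeof *out);
  if (out == NULL)
    return NULL;
  for (int i = 0; i < numFrames; i++)
    out[i] = frames[i].pageNum == -1 ? 0 : frames[i].fixCount;
  return out;
}
```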

The getNumReadIO function returns the number of pages that have been read from disk since the buffer pool was initialized. Your code is responsible for initializing this statistic at pool creation time and updating it whenever a page is read from the page file into a page frame.

getNumWriteIO returns the number of pages written to the page file since the buffer pool has been initialized.

buffer_mgr.h

#ifndef BUFFER_MANAGER_H
#define BUFFER_MANAGER_H

// Include return codes and methods for logging errors
#include "dberror.h"

// Include bool DT
#include "dt.h"

// Replacement Strategies
typedef enum ReplacementStrategy {
  RS_FIFO = 0,
  RS_LRU = 1,
  RS_CLOCK = 2,
  RS_LFU = 3,
  RS_LRU_K = 4
} ReplacementStrategy;

// Data Types and Structures
typedef int PageNumber;
#define NO_PAGE -1

typedef struct BM_BufferPool {
  char *pageFile;
  int numPages;
  ReplacementStrategy strategy;
  void *mgmtData; // use this one to store the bookkeeping info your buffer
                  // manager needs for a buffer pool
} BM_BufferPool;

typedef struct BM_PageHandle {
  PageNumber pageNum;
  char *data;
} BM_PageHandle;

// convenience macros
#define MAKE_POOL()                 \
  ((BM_BufferPool *) malloc (sizeof(BM_BufferPool)))

#define MAKE_PAGE_HANDLE()              \
  ((BM_PageHandle *) malloc (sizeof(BM_PageHandle)))

// Buffer Manager Interface Pool Handling
RC initBufferPool(BM_BufferPool *const bm, const char *const pageFileName,
          const int numPages, ReplacementStrategy strategy,
          void *stratData);
RC shutdownBufferPool(BM_BufferPool *const bm);
RC forceFlushPool(BM_BufferPool *const bm);

// Buffer Manager Interface Access Pages
RC markDirty (BM_BufferPool *const bm, BM_PageHandle *const page);
RC unpinPage (BM_BufferPool *const bm, BM_PageHandle *const page);
RC forcePage (BM_BufferPool *const bm, BM_PageHandle *const page);
RC pinPage (BM_BufferPool *const bm, BM_PageHandle *const page,
        const PageNumber pageNum);

// Statistics Interface
PageNumber *getFrameContents (BM_BufferPool *const bm);
bool *getDirtyFlags (BM_BufferPool *const bm);
int *getFixCounts (BM_BufferPool *const bm);
int getNumReadIO (BM_BufferPool *const bm);
int getNumWriteIO (BM_BufferPool *const bm);

#endif

Error handling and Printing buffer and page content

The initial assign2 folder contains code implementing several helper functions.

buffer_mgr_stat.h and buffer_mgr_stat.c

buffer_mgr_stat.h provides several functions for outputting buffer or page content to stdout or into a string. The implementation of these functions is provided so you do not have to implement them yourself. printPageContent prints the byte content of a memory page. printPoolContent prints a summary of the current content of a buffer pool. The format looks like this:

  {FIFO 3}: [0 0],[3x5],[2 1]

FIFO is the page replacement strategy. The number following the strategy is the size of the buffer pool (number of page frames). Each part enclosed in [] represents one buffer frame. The first number is the page number for the page that is currently stored in this buffer frame. The "x" indicates that the page is dirty, i.e., it has to be written back to disk before it can be replaced with another page. The last number is the fix count. For example, in the buffer shown above the first frame stores the disk page 0 with a fix count of 0. The second page frame stores the disk page 3 with a fix count of 5 and this page is dirty.
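To make the per-frame format concrete, the sketch below formats a single frame in the style of the example line. formatFrame is a hypothetical helper written only from the sample output above; it is not the provided printPoolContent implementation, which may differ in details such as how empty frames are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Formats one frame as "[<page><x or space><fix count>]" into buf.
 * Returns the number of characters written (without the terminating
 * NUL), or -1 if the buffer is too small. */
static int formatFrame(char *buf, size_t size, int pageNum, bool dirty,
                       int fixCount) {
  assert(buf != NULL);
  int n = snprintf(buf, size, "[%d%c%d]", pageNum, dirty ? 'x' : ' ',
                   fixCount);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

With this helper, a dirty page 3 with fix count 5 is rendered as [3x5] and a clean page 0 with fix count 0 as [0 0], matching the frames in the example.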

dberror.h and dberror.c

The dberror.h header defines error codes and provides a function to print an error message to stdout.

Source Code Structure

Your source code directories should be structured as follows. You should reuse your existing storage manager implementation. So before you start to develop, please copy your storage manager implementation from assign1 to assign2.

Put all source files in a folder assign2 in your git repository

This folder should contain at least …

  • the provided header and C files

  • a make file (Makefile) for building your code

  • a bunch of *.c and *.h files implementing the buffer manager

  • README.txt / README.md: A markdown or text file with a brief description of your solution

E.g., the structure may look like this:

   git
      assign2
          README.md
          Makefile
          buffer_mgr.h
          buffer_mgr_stat.c
          buffer_mgr_stat.h
          dberror.c
          dberror.h
          dt.h
          storage_mgr.h
          test_assign2_1.c
          test_assign2_2.c
          test_helper.h

Test Cases

test_assign2_1.c

This file implements several test cases for the buffer_mgr.h interface using the FIFO strategy. It also tests the LRU strategy. Please let your make file generate a test_assign2_1 binary for this code. You are encouraged to extend it with new test cases or use it as a template to develop your own test files.